This directory contains the replication package for the paper "A robot-assisted pipeline to rapidly scan 1.7 million historical aerial photographs". 

Code is stored in the directory "code"; data are stored in the directory "data"; output figures are stored in "graphs"; the protocol given to labelers of the calibration targets is stored in "protocols".  

Data are as follows:

production_daily.dta: daily production statistics during the project period. Originally recorded in Excel and converted to Stata format, with annotations removed. 
production_weekly.dta: weekly production statistics during the project period. Originally recorded in Excel and converted to Stata format, with annotations removed. 
image_group_bar.csv: records the implied smallest distinguishable feature, obtained by inspecting the groups of bars of different line widths in the resolution target and identifying the last group of bars that can be clearly distinguished and correctly counted. See the detailed description in protocols/Calibration target labeling instructions.pdf. 
pixel_length.csv: records the length, in pixels, of measuring scales across 121 randomly selected images. See the detailed description in protocols/Calibration target labeling instructions.pdf. 
tonal_values.csv: records the pixel value of each segment of the tonal step wedge across 121 randomly selected images. See the detailed description in protocols/Calibration target labeling instructions.pdf. 
Altitude and focal length.csv: records the altitude and focal length for a sample of 123 images for which this information is available. 
weights.xlsx: records the weight of each region (country, region or island) in the archive, calculated by dividing the number of boxes in that region by the total number of boxes. 
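
The region weights in weights.xlsx are shares of boxes: the number of boxes in a region over the total number of boxes in the archive. The following is a minimal Python sketch of that calculation, not part of the replication package. The function name region_weights and the box counts are hypothetical and chosen only for illustration; they are not taken from weights.xlsx.

```python
def region_weights(boxes_per_region):
    """Return each region's share of the total number of boxes."""
    total = sum(boxes_per_region.values())
    if total <= 0:
        raise ValueError("total number of boxes must be positive")
    return {region: n / total for region, n in boxes_per_region.items()}


# Hypothetical box counts, for illustration only (not from weights.xlsx).
example_boxes = {"Region A": 30, "Region B": 50, "Region C": 20}
weights = region_weights(example_boxes)

for region, w in weights.items():
    print(f"{region}: {w:.2f}")
```

By construction the weights sum to one, so they can be used directly to reweight per-region statistics.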

To replicate the calculations described in "Production statistics", run the Stata do-file code/production.do. This code uses production_daily.dta and production_weekly.dta as inputs. 

To replicate the calculations described in "Cost-effectiveness threshold scale" and Figure S8, run the Stata do-file code/break_even.do. This produces the figure break_even.png, included in the supplementary materials as Figure S8. All required assumptions are documented in the code. 

To replicate Figures S3-S4, run the R script code/altitude_scale_distribution.R. This produces the figures hist_h.png and hist_h_weighted.png (Figure S3 a) and b)), and hist_dens.png and hist_weighted (Figure S4 a) and b)), included in the supplementary materials. It also produces histograms of focal length, which are not included in the paper. The directory paths at the beginning of the script must be changed to match your system before the script can be run.

To replicate Figures S5-S7, run the R script code/plots_tonal_value_length.R. This produces the figures pixel_length_histogram_lines.pdf (Figure S5), grayscale_value_boxplot.pdf (Figure S6), and microns_histogram.pdf (Figure S7), included in the supplementary materials. The directory paths at the beginning of the script must be changed to match your system before the script can be run.
